Learning to Translate: A Statistical and Computational Analysis

Authors

  • Marco Turchi
  • Tijl De Bie
  • Cyril Goutte
  • Nello Cristianini
Abstract

We present an extensive experimental study of a Statistical Machine Translation system, Moses [14], from the point of view of its learning capabilities, and we discuss learning-theoretic aspects of these systems, including model selection, representation error, estimation error and hypothesis space. Very accurate learning curves are obtained using high-performance computing, and extrapolations of the projected performance of the system under different conditions are provided. The experiments show that the representation power of the system is not currently a limitation to its performance, while the inference of its models from finite sets of i.i.d. data is directly responsible for current performance limitations. Of the models, the composition of the translation tables is more important than the numeric estimates of probabilities. The rate of improvement with sample size is no faster than logarithmic, and this is not likely to change with more advanced methods for estimating the numeric parameters. The fundamental limitation to the performance of the system seems to be a direct consequence of the Zipf law governing textual data. A few possible research directions are discussed as a result of this investigation, most notably the integration of linguistic rules into the model inference phase and the development of active learning procedures.
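The claim that improvement with sample size is no faster than logarithmic can be made concrete with a small sketch. The code below fits a curve of the form score = a + b * ln(n) by ordinary least squares and reports the gain obtained by doubling the training set. The function names, the curve form as a fitting target, and all numbers are illustrative assumptions; the points are synthetic, generated from a known curve, and are not results from the paper or output of Moses.

```python
import math

def fit_log_curve(sizes, scores):
    """Least-squares fit of score = a + b * ln(size). Hypothetical helper."""
    xs = [math.log(n) for n in sizes]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(scores) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict(a, b, size):
    """Extrapolate the fitted curve to a new training-set size."""
    return a + b * math.log(size)

# Synthetic points generated from a known logarithmic curve
# (made-up values, NOT measurements from the paper).
sizes = [10_000, 20_000, 40_000, 80_000, 160_000]
scores = [0.10 + 0.02 * math.log(n) for n in sizes]

a, b = fit_log_curve(sizes, scores)

# Under a logarithmic curve, every doubling of the data adds the same
# fixed amount b * ln(2), regardless of the current size.
gain_per_doubling = b * math.log(2)
```

Under this assumed form, each doubling of the corpus buys the same absolute gain, so every further fixed increment in score requires exponentially more data.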

Similar resources

Sparse Structured Principal Component Analysis and Model Learning for Classification and Quality Detection of Rice Grains

In scientific and commercial fields associated with modern agriculture, the categorization of different rice types and the determination of their quality are very important. Various image processing algorithms have been applied in recent years to detect different agricultural products. In this paper, the problem of rice classification and quality detection is presented based on model learning concepts includ...

Sports Result Prediction Based on Machine Learning and Computational Intelligence Approaches: A Survey

In the current world, sports produce considerable statistical information about each player, team, games, and seasons. Traditional sports science believed science to be owned by experts, coaches, team managers, and analyzers. However, sports organizations have recently realized the abundant science available in their data and sought to take advantage of that science through the use of data mini...

Comparative Analysis of Machine Learning Algorithms with Optimization Purposes

The fields of optimization and machine learning increasingly interact, and optimization in different problems leads to the use of machine learning approaches. Machine learning algorithms work in reasonable computational time for specific classes of problems and have an important role in extracting knowledge from large amounts of data. In this paper, a methodology has been employed to opt...

On Statistical Query Sampling and NMR Quantum Computing

We introduce a “Statistical Query Sampling” model, in which the goal of an algorithm is to produce an element in a hidden set S ⊆ {0, 1}^n with reasonable probability. The algorithm gains information about S through oracle calls (statistical queries), where the algorithm submits a query function g(·) and receives an approximation to Pr_{x∈S}[g(x) = 1]. We show how this model is related to NMR quan...

On the Translation Quality of Google Translate: With a Concentration on Adjectives

Translation, whose first traces date back at least to 3000 BC (Newmark, 1988), has always been considered time-consuming and labor-consuming. In view of this, experts have made numerous efforts to develop some mechanical systems which can reduce part of this time and labor. The advancement of computers in the second half of the twentieth century paved the ground for the invention of machine tra...

A METAHEURISTIC-BASED ARTIFICIAL NEURAL NETWORK FOR PLASTIC LIMIT ANALYSIS OF FRAMES

Despite the advantages of the plastic limit analysis of structures, this robust method suffers from some drawbacks such as intense computational cost. Over the past two decades, metaheuristic algorithms have improved the performance of plastic limit analysis, especially in structural problems. Additionally, graph theoretical algorithms have decreased the computational time of the process impre...


Journal title:
  • Adv. Artificial Intelligence

Volume 2012  Issue

Pages  -

Publication date 2012